Sampling Techniques for Zero-sum, Discounted Markov Games

Authors

  • Uday Savagaonkar
  • Edwin K. P. Chong
  • Robert L. Givan
Abstract

In this paper, we first present a key approximation result for zero-sum, discounted Markov games, providing bounds on the state-wise loss and the loss in the sup norm resulting from using approximate Q-functions. Then we extend the policy rollout technique for MDPs to Markov games. Using our key approximation result, we prove that, under certain conditions, the rollout technique gives rise to a policy that is closer to the Nash equilibrium than the base policy. We also use our key result to provide an alternative analysis of a second sampling approach to Markov games known as sparse sampling. Our analysis implies the (already known) result that, under certain conditions, the policy generated by the sparse-sampling algorithm is close to the Nash equilibrium. We prove that the amount of sampling that guarantees these results is independent of the state-space size of the Markov game.
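To make the rollout construction concrete, the following is a minimal Python sketch of one-step policy rollout for a zero-sum Markov game. Everything here is an illustrative assumption rather than the paper's own pseudocode: `sample(s, a1, a2)` is a hypothetical generative model returning a next state and reward, `base_policy(s)` returns a base action pair, and the matrix game formed by the estimated Q-values at each state is solved with a standard linear program.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Minimax mixed strategy and value for the row (maximizing) player
    of a zero-sum matrix game with payoff matrix A, via a standard LP."""
    m, n = A.shape
    c = np.zeros(m + 1); c[-1] = -1.0          # maximize the game value v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - (A^T x)_j <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0  # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0.0, 1.0)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

def rollout_q(sample, base_policy, s, a1, a2, gamma, horizon, n_traj):
    """Monte-Carlo estimate of the base policy's Q-value at (s, a1, a2):
    play (a1, a2) once, then follow the base policy pair."""
    total = 0.0
    for _ in range(n_traj):
        s2, r = sample(s, a1, a2)
        ret, disc = r, gamma
        for _ in range(horizon):               # truncated discounted return
            b1, b2 = base_policy(s2)
            s2, r = sample(s2, b1, b2)
            ret += disc * r
            disc *= gamma
        total += ret
    return total / n_traj

def rollout_policy(sample, base_policy, actions1, actions2, gamma,
                   horizon=50, n_traj=100):
    """At each state, estimate the Q-matrix by sampling and play the
    minimax strategy of the resulting matrix game."""
    def policy(s):
        Q = np.array([[rollout_q(sample, base_policy, s, a1, a2,
                                 gamma, horizon, n_traj)
                       for a2 in actions2] for a1 in actions1])
        x, _ = solve_matrix_game(Q)
        return x                               # mixed strategy for player 1
    return policy
```

Note that the per-state cost of this sketch depends only on the action-set sizes, the rollout horizon, and the number of trajectories, which is consistent with the abstract's claim that the amount of sampling needed is independent of the state-space size.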

Similar Articles

Approximation Results on Sampling Techniques for Zero-sum, Discounted Markov Games

We extend the “policy rollout” sampling technique for Markov decision processes to Markov games, and provide an approximation result guaranteeing that the resulting sampling-based policy is closer to the Nash equilibrium than the underlying base policy. This improvement is achieved with an amount of sampling that is independent of the state-space size. We base our approximation result on a more...


Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

This paper addresses the problem of learning a Nash equilibrium in γ-discounted, multiplayer, general-sum Markov games (MGs) in a batch setting. As the number of players in an MG grows, the agents may either collaborate or split into opposing teams to increase their final rewards. One way to address this problem is to look for a Nash equilibrium. Although several techniques have been found for the subcase of ...


Magnifying Lens Abstraction for Stochastic Games with Discounted and Long-run Average Objectives

Turn-based stochastic games and their important subclass, Markov decision processes (MDPs), provide models for systems with both probabilistic and nondeterministic behaviors. We consider turn-based stochastic games with two classical quantitative objectives: discounted-sum and long-run average objectives. The game models and the quantitative objectives are widely used in probabilistic verification, ...


Structural approximations in discounted semi-Markov games

We consider the problem of approximating the values and the equilibria in two-person, zero-sum, discounted semi-Markov games with infinite horizon and compact action spaces, when several uncertainties are present about the parameters of the model. Specifically: on the one hand, we study approximations made on the transition probabilities, the discount factor and the reward functions when the state ...


Fast Planning in Stochastic Games

Stochastic games generalize Markov decision processes (MDPs) to a multiagent setting by allowing the state transitions to depend jointly on all player actions, and having rewards determined by multiplayer matrix games at each state. We consider the problem of computing Nash equilibria in stochastic games, the analogue of planning in MDPs. We begin by providing a generalization of finite-horizon v...
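The snippet above breaks off mid-sentence, but the construction it describes, a matrix game at each state embedded inside a value-iteration sweep, is the classical Shapley iteration for discounted zero-sum stochastic games. Below is a minimal sketch of that iteration, offered only as background (it is not necessarily the cited paper's algorithm), reusing the hypothetical `solve_matrix_game` helper from the earlier block.

```python
import numpy as np

def shapley_value_iteration(R, P, gamma, n_iters):
    """Value iteration for a zero-sum discounted stochastic game.
    R[s] is the |A1| x |A2| payoff matrix at state s; P[s, a1, a2] is a
    probability distribution over next states.  Each sweep solves the
    matrix game R[s] + gamma * E[V(next state)] at every state."""
    V = np.zeros(len(R))
    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for s in range(len(R)):
            Q = R[s] + gamma * P[s] @ V         # matrix game at state s
            _, V_new[s] = solve_matrix_game(Q)  # helper from sketch above
        V = V_new
    return V
```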



Journal:

Volume   Issue

Pages  -

Publication date: 2007